Acquiring a Disambiguation Model For Discourse Connectives
Authors
Abstract
Discourse connectives can be ambiguous in sense: a single connective can signal more than one rhetorical relation. The aim of this study is to discover how to disambiguate such connectives using a statistical model. Six discourse connectives (after, as soon as, before, once, since and while) are considered, each of which is ambiguous in the SDRT (Segmented Discourse Representation Theory; Asher & Lascarides, 2003) relation that it signals. Maximum entropy models using different combinations of linguistic features derived from the connective's context are trained and tested on a corpus of examples containing these connectives, annotated with the correct rhetorical relation. The best-performing model achieves an average accuracy of 70.4% across all the connectives, compared with a most-common-sense baseline of 57.2%. Performance varies widely between connectives: the models for since and while are 30 percentage points above the baseline, whereas the models for after and as soon as fail to beat the baseline by a statistically significant margin. The most informative features were found to be those derived from the main verbs in the text spans connected by the rhetorical relation, and the words and parts of speech collocated with the connective.
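The most-common-sense baseline mentioned in the abstract can be illustrated with a minimal sketch. The function name, the toy examples, and the relation labels below are hypothetical and are not taken from the paper's corpus; the sketch also assumes that the reported average is an unweighted mean over connectives, which the abstract does not state.

```python
from collections import Counter, defaultdict


def most_common_sense_baseline(train, test):
    """Per-connective accuracy of always predicting the most frequent training sense.

    train and test are lists of (connective, sense) pairs.
    """
    # Count how often each sense occurs with each connective in the training data.
    sense_counts = defaultdict(Counter)
    for connective, sense in train:
        sense_counts[connective][sense] += 1

    # The baseline prediction for a connective is its most frequent sense.
    majority = {c: counts.most_common(1)[0][0] for c, counts in sense_counts.items()}

    correct = Counter()
    total = Counter()
    for connective, sense in test:
        total[connective] += 1
        if majority.get(connective) == sense:
            correct[connective] += 1
    return {c: correct[c] / total[c] for c in total}


# Toy, made-up examples (NOT from the paper); the sense labels are illustrative only.
train = [
    ("since", "explanation"), ("since", "explanation"), ("since", "temporal"),
    ("while", "contrast"), ("while", "temporal"), ("while", "contrast"),
]
test = [
    ("since", "explanation"), ("since", "temporal"),
    ("while", "contrast"), ("while", "contrast"),
]

per_connective = most_common_sense_baseline(train, test)
# Assumption: unweighted (macro) average over connectives.
average = sum(per_connective.values()) / len(per_connective)
print(per_connective, average)
```

A trained classifier would be judged by how far its per-connective accuracy rises above these baseline figures.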
Similar resources
Automatic Disambiguation of French Discourse Connectives
Discourse connectives (e.g. however, because) are terms that can explicitly convey a discourse relation within a text. While discourse connectives have been shown to be an effective clue to automatically identify discourse relations, they are not always used to convey such relations, thus they should first be disambiguated between discourse-usage and non-discourse-usage. In this paper, we inves...
Experiments on Sense Annotations and Sense Disambiguation of Discourse Connectives
Discourse connectives can be analyzed as discourse level predicates which project predicate-argument structure on a par with verbs at the sentence level. The Penn Discourse Treebank (PDTB) reflects this view in its design providing annotation of the discourse connectives and their arguments. Like verbs, discourse connectives have multiple senses. We present a set of manual sense annotation stud...
Multilingual Annotation and Disambiguation of Discourse Connectives for Machine Translation
Many discourse connectives can signal several types of relations between sentences. Their automatic disambiguation, i.e. the labeling of the correct sense of each occurrence, is important for discourse parsing, but could also be helpful to machine translation. We describe new approaches for improving the accuracy of manual annotation of three discourse connectives (two English, one French) by u...
Machine Translation of Labeled Discourse Connectives
This paper shows how the disambiguation of discourse connectives can improve their automatic translation, while preserving the overall performance of statistical MT as measured by BLEU. State-of-the-art automatic classifiers for rhetorical relations are used prior to MT to label discourse connectives that signal those relations. These labels are used for MT in two ways: (1) by augmenting factor...
Disambiguating Temporal–Contrastive Discourse Connectives for Machine Translation
Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types of relations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in...